Abstract. A causal rate distortion function (RDF) is defined, existence of an extremum solution
is described via weak*-convergence, and its relation to filtering theory is discussed. The relation to
filtering is obtained via a causal constraint imposed on the reconstruction kernel so that it is realizable,
while the extremum solution is given for the stationary case.
1. Introduction. Shannon's information theory for reliable communication evolved over the years without much emphasis on real-time realizability or causality imposed on the communication sub-systems. In particular, the classical rate distortion function (RDF) for source data compression deals with the characterization of the optimal reconstruction conditional distribution subject to a fidelity criterion, without regard for realizability. Hence, coding schemes which achieve the RDF are not realizable.

On the other hand, filtering theory is developed by imposing real-time realizability on estimators with respect to measurement data. Although both reliable communication and filtering (state estimation for control) are concerned with reconstruction of processes, the main underlying assumptions characterizing them are different.

In this paper, the intersection of rate distortion function (RDF) and realizable filtering theory is established by invoking the additional assumption that the reconstruction kernel is realizable via causal operations, while the optimal causal reconstruction kernel is derived. Consequently, the connection between causal RDF, its characterization via the optimal reconstruction kernel, and realizable filtering theory is established under very general conditions on the source (including Markov sources). The fundamental advantage of the new filtering approach based on causal RDF is the ability to ensure average or probabilistic bounds on the estimation error, which is a non-trivial task when dealing with Bayesian filtering techniques.

The first relation between information theory and filtering via the distortion rate function is discussed by R. S. Bucy in [2], who computes a realizable distortion rate function with square criteria for two samples of the Ornstein-Uhlenbeck process. The earlier work of A. K. Gorbunov and M. S. Pinsker [7] on $\epsilon$-entropy, defined via a causal constraint on the reproduction distribution of the RDF, although not directly related to the realizability question pursued by Bucy, computes the causal RDF for stationary Gaussian processes via power spectral densities. The realizability constraints imposed on the reproduction conditional distribution in [2] and [7] are different. The actual computation of the distortion rate or RDF in these works is based on the Gaussianity of the process, while no general theory is developed to handle arbitrary processes.



*Ph.D. student at ECE Department, University of Cyprus, Green Park, Aglantzias 91, P.O. Box 20537, 1687, Nicosia, Cyprus (photios.stavrou@ucy.ac.cy).

†Professor at ECE Department, University of Cyprus, Green Park, Aglantzias 91, P.O. Box 20537, 1687, Nicosia, Cyprus (chadcha@ucy.ac.cy).


P. A. STAVROU AND C. D. CHARALAMBOUS 



The main results described are the following.

1) Existence of the causal RDF using the topology of weak*-convergence. 

2) Closed form expression of the optimal reconstruction conditional distribution 
for stationary processes, which is realizable via causal operations. 

3) Realization procedure of the filter based on the causal RDF. 

Next, we give a high level discussion on Bayesian filtering theory and we present some aspects of the problem and results pursued in this paper. Consider a discrete-time process $X^n \triangleq \{X_0, X_1, \ldots, X_n\} \in \mathcal{X}_{0,n} \triangleq \times_{i=0}^{n} \mathcal{X}_i$ and its reconstruction $Y^n \triangleq \{Y_0, Y_1, \ldots, Y_n\} \in \mathcal{Y}_{0,n} \triangleq \times_{i=0}^{n} \mathcal{Y}_i$, where $\mathcal{X}_i$ and $\mathcal{Y}_i$ are Polish spaces (complete separable metric spaces). The objective is to reconstruct $X^n$ by $Y^n$ causally subject to a distortion or fidelity criterion.

In classical filtering, one is given a mathematical model that generates the process $X^n$, $\{P_{X_i|X^{i-1}}(dx_i|x^{i-1}) : i = 0, 1, \ldots, n\}$, often induced via discrete-time recursive dynamics, and a mathematical model that generates observed data obtained from sensors, say, $Z^n$, $\{P_{Z_i|Z^{i-1},X^i}(dz_i|z^{i-1},x^i) : i = 0, 1, \ldots, n\}$, while $Y^n$ are the causal estimates of some function of the process $X^n$ based on the observed data $Z^n$. Thus, in classical filtering theory both models which generate the unobserved and observed processes, $X^n$ and $Z^n$, respectively, are given a priori. Fig. 1.1 illustrates the cascade block diagram of the filtering problem.

Fig. 1.1. Block Diagram of Filtering Problem
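As an illustration of the classical setting just described, the following is a minimal sketch of a causal Bayesian filter. It assumes, purely for illustration, a finite-state first-order Markov source and a memoryless observation channel, which is a special case of the models $P_{X_i|X^{i-1}}$ and $P_{Z_i|Z^{i-1},X^i}$ above. The function names and all numerical values are hypothetical.

```python
# Hypothetical two-state example; the Markov source and memoryless sensor
# are simplifying assumptions, not the general models of the text.

def bayes_filter(prior, trans, obs_lik, observations):
    """Causal (forward) filter for a finite-state Markov source.

    prior[x]      : P(X_0 = x)
    trans[x][x2]  : P(X_{i+1} = x2 | X_i = x)
    obs_lik[x][z] : P(Z_i = z | X_i = x)
    Returns the posteriors P(X_i = . | Z_0, ..., Z_i) for i = 0..n.
    """
    n_states = len(prior)
    predicted = list(prior)
    posteriors = []
    for z in observations:
        # Measurement update: weight the prediction by the likelihood of z.
        unnorm = [predicted[x] * obs_lik[x][z] for x in range(n_states)]
        total = sum(unnorm)
        post = [u / total for u in unnorm]
        posteriors.append(post)
        # Time update: push the posterior through the source dynamics.
        predicted = [
            sum(post[x] * trans[x][x2] for x in range(n_states))
            for x2 in range(n_states)
        ]
    return posteriors


def map_estimates(posteriors):
    """Causal estimate Y_i: the most probable state given Z_0, ..., Z_i."""
    return [max(range(len(p)), key=lambda x: p[x]) for p in posteriors]


prior = [0.5, 0.5]
trans = [[0.9, 0.1], [0.1, 0.9]]
obs_lik = [[0.8, 0.2], [0.2, 0.8]]
posteriors = bayes_filter(prior, trans, obs_lik, [0, 0, 1])
estimates = map_estimates(posteriors)
```

Each estimate uses only observations up to the current time, which is the realizability property that the causal RDF formulation imposes on the reconstruction kernel instead.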

In causal rate distortion theory one is given the process $X^n$, which induces $\{P_{X_i|X^{i-1}}(dx_i|x^{i-1}) : i = 0, 1, \ldots, n\}$, and determines the causal reconstruction conditional distribution $\{P_{Y_i|Y^{i-1},X^i}(dy_i|y^{i-1},x^i) : i = 0, 1, \ldots, n\}$ which minimizes the mutual information between $X^n$ and $Y^n$ subject to a distortion or fidelity constraint, via a causal (realizability) constraint. The filter $\{Y_i : i = 0, 1, \ldots, n\}$ of $\{X_i : i = 0, 1, \ldots, n\}$ is found by realizing the reconstruction distribution $\{P_{Y_i|Y^{i-1},X^i}(dy_i|y^{i-1},x^i) : i = 0, 1, \ldots, n\}$ via a cascade of sub-systems as shown in Fig. 1.2.

Fig. 1.2. Block Diagram of Filtering via Causal Rate Distortion Function

The distortion function or fidelity constraint between $x^n$ and its reconstruction $y^n$ is a measurable function defined by

$$d_{0,n} : \mathcal{X}_{0,n} \times \mathcal{Y}_{0,n} \mapsto [0, \infty], \qquad d_{0,n}(x^n, y^n) \triangleq \sum_{i=0}^{n} \rho_{0,i}(x^i, y^i)$$



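The mutual information between $X^n$ and $Y^n$, for a given distribution $P_{X^n}(dx^n)$ and conditional distribution $P_{Y^n|X^n}(dy^n|x^n)$, is defined by

$$I(X^n; Y^n) \triangleq \int_{\mathcal{X}_{0,n} \times \mathcal{Y}_{0,n}} \log\Big(\frac{P_{Y^n|X^n}(dy^n|x^n)}{P_{Y^n}(dy^n)}\Big) P_{Y^n|X^n}(dy^n|x^n) \otimes P_{X^n}(dx^n) \tag{1.1}$$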
Define the $(n+1)$-fold causal convolution measure

$$\overrightarrow{P}_{Y^n|X^n}(dy^n|x^n) \triangleq \otimes_{i=0}^{n} P_{Y_i|Y^{i-1},X^i}(dy_i|y^{i-1},x^i)\text{-a.s.} \tag{1.2}$$

The realizability constraint for a causal filter is defined by

$$\overrightarrow{Q}_{ad} \triangleq \Big\{P_{Y^n|X^n}(dy^n|x^n) : P_{Y^n|X^n}(dy^n|x^n) = \overrightarrow{P}_{Y^n|X^n}(dy^n|x^n)\text{-a.s.}\Big\} \tag{1.3}$$

The realizability condition (1.3) is necessary; otherwise the connection between filtering and realizable rate distortion theory cannot be established. This is due to the fact that $P_{Y^n|X^n}(dy^n|x^n) = \otimes_{i=0}^{n} P_{Y_i|Y^{i-1},X^n}(dy_i|y^{i-1},x^n)$-a.s., and hence in general, for each $i = 0, 1, \ldots, n$, the conditional distribution of $Y_i$ depends on future symbols $\{X_{i+1}, \ldots, X_n\}$ in addition to the past and present symbols $\{Y^{i-1}, X^i\}$.

Causal Rate Distortion Function. The causal RDF is defined by

$$R^c_{0,n}(D) \triangleq \inf_{P_{Y^n|X^n} \in \overrightarrow{Q}_{ad} :\ E\{d_{0,n}(X^n, Y^n)\} \le D} I(X^n; Y^n) \tag{1.4}$$

Note that realizability condition (1.3) is different from the realizability condition in [2], which is defined under the assumption that $Y_i$ is independent of $X^*_{j|i} \triangleq X_j - E(X_j | X^i)$, $j = i+1, i+2, \ldots$. The claim here is that realizability condition (1.3) is more natural and applies to processes which are not necessarily Gaussian with a square error distortion function. Realizability condition (1.3) is weaker than the causality condition defined by the requirement that $X_{n+1}^{\infty} \leftrightarrow X^n \leftrightarrow Y^n$ forms a Markov chain.

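The point to be made regarding (1.4) is that (see also Lemma 2.3):

$$P_{Y^n|X^n}(dy^n|x^n) = \overrightarrow{P}_{Y^n|X^n}(dy^n|x^n)\text{-a.s.} \Longrightarrow$$

$$I(X^n; Y^n) = \int \log\Big(\frac{\overrightarrow{P}_{Y^n|X^n}(dy^n|x^n)}{P_{Y^n}(dy^n)}\Big) \overrightarrow{P}_{Y^n|X^n}(dy^n|x^n) \otimes P_{X^n}(dx^n) \equiv \mathbb{I}(P_{X^n}, \overrightarrow{P}_{Y^n|X^n})$$

where $\mathbb{I}(P_{X^n}, \overrightarrow{P}_{Y^n|X^n})$ points out the functional dependence of $I(X^n; Y^n)$ on $\{P_{X^n}, \overrightarrow{P}_{Y^n|X^n}\}$.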

The paper is organized as follows. Section 2 discusses the formulation on abstract spaces. Section 3 establishes existence of the optimal minimizing kernel, and Section 4 derives the stationary solution. Section 5 describes the realization of the causal RDF. Throughout the manuscript proofs are omitted due to space limitations.

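2. Problem Formulation. Let $\mathbb{N}^n \triangleq \{0, 1, \ldots, n\}$, $n \in \mathbb{N} \triangleq \{0, 1, 2, \ldots\}$. The source and reconstruction alphabets, respectively, are sequences of Polish spaces $\{\mathcal{X}_t : t \in \mathbb{N}\}$ and $\{\mathcal{Y}_t : t \in \mathbb{N}\}$, associated with their corresponding measurable spaces $(\mathcal{X}_t, \mathcal{B}(\mathcal{X}_t))$ and $(\mathcal{Y}_t, \mathcal{B}(\mathcal{Y}_t))$, $t \in \mathbb{N}$. Sequences of alphabets are identified with the product spaces $(\mathcal{X}_{0,n}, \mathcal{B}(\mathcal{X}_{0,n})) \triangleq \times_{k=0}^{n} (\mathcal{X}_k, \mathcal{B}(\mathcal{X}_k))$ and $(\mathcal{Y}_{0,n}, \mathcal{B}(\mathcal{Y}_{0,n})) \triangleq \times_{k=0}^{n} (\mathcal{Y}_k, \mathcal{B}(\mathcal{Y}_k))$. The source and reconstruction are processes denoted by $X^n \triangleq \{X_t : t \in \mathbb{N}^n\}$, $X : \mathbb{N}^n \times \Omega \mapsto \mathcal{X}_t$, and by $Y^n \triangleq \{Y_t : t \in \mathbb{N}^n\}$, $Y : \mathbb{N}^n \times \Omega \mapsto \mathcal{Y}_t$, respectively. Probability measures on any measurable space $(\mathcal{Z}, \mathcal{B}(\mathcal{Z}))$ are denoted by $\mathcal{M}_1(\mathcal{Z})$. It is assumed that the $\sigma$-algebras $\sigma\{X^{-1}\} = \sigma\{Y^{-1}\} = \{\emptyset, \Omega\}$.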

Definition 2.1. Let $(\mathcal{X}, \mathcal{B}(\mathcal{X}))$, $(\mathcal{Y}, \mathcal{B}(\mathcal{Y}))$ be measurable spaces in which $\mathcal{Y}$ is a Polish space. A stochastic kernel on $\mathcal{Y}$ given $\mathcal{X}$ is a mapping $q : \mathcal{B}(\mathcal{Y}) \times \mathcal{X} \mapsto [0, 1]$ satisfying the following two properties:

1) For every $x \in \mathcal{X}$, the set function $q(\cdot; x)$ is a probability measure (possibly finitely additive) on $\mathcal{B}(\mathcal{Y})$.

2) For every $F \in \mathcal{B}(\mathcal{Y})$, the function $q(F; \cdot)$ is $\mathcal{B}(\mathcal{X})$-measurable.

The set of all such stochastic kernels is denoted by $\mathcal{Q}(\mathcal{Y}; \mathcal{X})$.

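Definition 2.2. Given measurable spaces $(\mathcal{X}_{0,n}, \mathcal{B}(\mathcal{X}_{0,n}))$, $(\mathcal{Y}_{0,n}, \mathcal{B}(\mathcal{Y}_{0,n}))$, then

1) A Non-Causal Data Compression Channel is a stochastic kernel $q_{0,n}(dy^n; x^n) \in \mathcal{Q}(\mathcal{Y}_{0,n}; \mathcal{X}_{0,n})$ which admits a factorization into a non-causal sequence

$$q_{0,n}(dy^n; x^n) = \otimes_{i=0}^{n} q_i(dy_i; y^{i-1}, x^n)$$

where $q_i(dy_i; y^{i-1}, x^n) \in \mathcal{Q}(\mathcal{Y}_i; \mathcal{Y}_{0,i-1} \times \mathcal{X}_{0,n})$, $i = 0, \ldots, n$, $n \in \mathbb{N}$.

2) A Causally Restricted Data Compression Channel is a stochastic kernel $q_{0,n}(dy^n; x^n) \in \mathcal{Q}(\mathcal{Y}_{0,n}; \mathcal{X}_{0,n})$ which admits a factorization into a causal sequence

$$q_{0,n}(dy^n; x^n) = \overrightarrow{q}_{0,n}(dy^n; x^n) \triangleq \otimes_{i=0}^{n} q_i(dy_i; y^{i-1}, x^i)\text{-a.s.},$$

where $q_i \in \mathcal{Q}(\mathcal{Y}_i; \mathcal{Y}_{0,i-1} \times \mathcal{X}_{0,i})$, $i = 0, \ldots, n$, $n \in \mathbb{N}$.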
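The difference between the two factorizations can be checked numerically in a toy case. The sketch below assumes finite alphabets and two time steps ($n = 1$), where the causal restriction reduces to requiring that the law of $y_0$ given $(x_0, x_1)$ does not depend on the future symbol $x_1$. The helper names `causal_kernel`, `copy_future_kernel` and `is_causal` are hypothetical.

```python
from itertools import product


def causal_kernel(q0, q1, alphabet):
    """Compose q(y0, y1 | x0, x1) = q0(y0 | x0) * q1(y1 | y0, x0, x1)."""
    kernel = {}
    for x0, x1 in product(alphabet, repeat=2):
        row = {}
        for y0, y1 in product(alphabet, repeat=2):
            row[(y0, y1)] = q0[x0][y0] * q1[(y0, x0, x1)][y1]
        kernel[(x0, x1)] = row
    return kernel


def copy_future_kernel(alphabet):
    """A non-causal kernel: y0 copies the future symbol x1, y1 copies x0."""
    kernel = {}
    for x0, x1 in product(alphabet, repeat=2):
        row = {}
        for y0, y1 in product(alphabet, repeat=2):
            row[(y0, y1)] = 1.0 if (y0 == x1 and y1 == x0) else 0.0
        kernel[(x0, x1)] = row
    return kernel


def is_causal(kernel, alphabet, tol=1e-12):
    """True if the law of y0 given (x0, x1) does not depend on x1."""
    for x0 in alphabet:
        reference = None
        for x1 in alphabet:
            row = kernel[(x0, x1)]
            # Marginalize out y1 to get the conditional law of y0.
            marginal = [sum(row[(y0, y1)] for y1 in alphabet) for y0 in alphabet]
            if reference is None:
                reference = marginal
            elif any(abs(a - b) > tol for a, b in zip(reference, marginal)):
                return False
    return True


alphabet = [0, 1]
q0 = {0: [0.9, 0.1], 1: [0.1, 0.9]}
q1 = {}
for y0, x0, x1 in product(alphabet, repeat=3):
    q1[(y0, x0, x1)] = [0.8, 0.2] if x1 == 0 else [0.2, 0.8]

causal = causal_kernel(q0, q1, alphabet)
anticipating = copy_future_kernel(alphabet)
print(is_causal(causal, alphabet), is_causal(anticipating, alphabet))  # True False
```

For two time steps only the first factor can anticipate the future, so checking the $y_0$-marginal is enough; for longer horizons the same check would have to be repeated for each $i < n$.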

2.1. Causal Rate Distortion Function. In this subsection the causal RDF is defined. Given a source probability measure $\mu_{0,n} \in \mathcal{M}_1(\mathcal{X}_{0,n})$ (possibly finitely additive) and a reconstruction kernel $q_{0,n} \in \mathcal{Q}(\mathcal{Y}_{0,n}; \mathcal{X}_{0,n})$, one can define three probability measures as follows.

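(P1): The joint measure $P_{0,n} \in \mathcal{M}_1(\mathcal{Y}_{0,n} \times \mathcal{X}_{0,n})$:

$$P_{0,n}(G_{0,n}) \triangleq (\mu_{0,n} \otimes q_{0,n})(G_{0,n}) = \int_{\mathcal{X}_{0,n}} q_{0,n}(G_{0,n,x^n}; x^n)\, \mu_{0,n}(dx^n), \qquad G_{0,n} \in \mathcal{B}(\mathcal{X}_{0,n}) \times \mathcal{B}(\mathcal{Y}_{0,n})$$

where $G_{0,n,x^n}$ is the $x^n$-section of $G_{0,n}$ at point $x^n$ defined by $G_{0,n,x^n} \triangleq \{y^n \in \mathcal{Y}_{0,n} : (x^n, y^n) \in G_{0,n}\}$ and $\otimes$ denotes the convolution.

(P2): The marginal measure $\nu_{0,n} \in \mathcal{M}_1(\mathcal{Y}_{0,n})$:

$$\nu_{0,n}(F_{0,n}) \triangleq P_{0,n}(\mathcal{X}_{0,n} \times F_{0,n}), \qquad F_{0,n} \in \mathcal{B}(\mathcal{Y}_{0,n})$$

$$= \int_{\mathcal{X}_{0,n}} q_{0,n}((\mathcal{X}_{0,n} \times F_{0,n})_{x^n}; x^n)\, \mu_{0,n}(dx^n) = \int_{\mathcal{X}_{0,n}} q_{0,n}(F_{0,n}; x^n)\, \mu_{0,n}(dx^n)$$

(P3): The product measure $\pi_{0,n} : \mathcal{B}(\mathcal{X}_{0,n}) \times \mathcal{B}(\mathcal{Y}_{0,n}) \mapsto [0, 1]$ of $\mu_{0,n} \in \mathcal{M}_1(\mathcal{X}_{0,n})$ and $\nu_{0,n} \in \mathcal{M}_1(\mathcal{Y}_{0,n})$ for $G_{0,n} \in \mathcal{B}(\mathcal{X}_{0,n}) \times \mathcal{B}(\mathcal{Y}_{0,n})$:

$$\pi_{0,n}(G_{0,n}) \triangleq (\mu_{0,n} \times \nu_{0,n})(G_{0,n}) = \int_{\mathcal{X}_{0,n}} \nu_{0,n}(G_{0,n,x^n})\, \mu_{0,n}(dx^n)$$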

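The precise definition of mutual information between two sequences of random variables $X^n$ and $Y^n$, denoted $I(X^n; Y^n)$, is given via the Kullback-Leibler distance (or relative entropy) between the joint probability distribution of $(X^n, Y^n)$ and the product of its marginal probability distributions of $X^n$ and $Y^n$, using the Radon-Nikodym derivative. Hence, by the chain rule of relative entropy:

$$I(X^n; Y^n) \triangleq \mathbb{D}(P_{0,n} \| \pi_{0,n}) = \int_{\mathcal{X}_{0,n} \times \mathcal{Y}_{0,n}} \log\Big(\frac{d(\mu_{0,n} \otimes q_{0,n})}{d(\mu_{0,n} \times \nu_{0,n})}\Big)\, d(\mu_{0,n} \otimes q_{0,n})$$

$$= \int_{\mathcal{X}_{0,n} \times \mathcal{Y}_{0,n}} \log\Big(\frac{q_{0,n}(dy^n; x^n)}{\nu_{0,n}(dy^n)}\Big)\, q_{0,n}(dy^n; x^n) \otimes \mu_{0,n}(dx^n)$$

$$= \int_{\mathcal{X}_{0,n}} \mathbb{D}(q_{0,n}(\cdot; x^n) \| \nu_{0,n}(\cdot))\, \mu_{0,n}(dx^n) \equiv \mathbb{I}(\mu_{0,n}, q_{0,n}) \tag{2.1}$$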
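For finite alphabets the three measures (P1)-(P3) are arrays and (2.1) becomes a finite sum. The following sketch, with the hypothetical function name `mutual_information`, evaluates $\mathbb{I}(\mu, q)$ in nats under that finite-alphabet assumption; the general measure-theoretic case of the text is not covered.

```python
import math


def mutual_information(mu, q):
    """I(mu, q) in nats for finite alphabets.

    mu[x]   : source probability of x
    q[x][y] : reconstruction kernel q(y | x)
    """
    n_x, n_y = len(mu), len(q[0])
    # (P2) marginal on the reconstruction alphabet: nu(y) = sum_x q(y|x) mu(x).
    nu = [sum(mu[x] * q[x][y] for x in range(n_x)) for y in range(n_y)]
    total = 0.0
    for x in range(n_x):
        for y in range(n_y):
            joint = mu[x] * q[x][y]              # (P1) joint measure
            if joint > 0.0:
                product_measure = mu[x] * nu[y]  # (P3) product measure
                total += joint * math.log(joint / product_measure)
    return total


# A noiseless kernel on a uniform binary source carries log(2) nats;
# a kernel that ignores x carries none.
noiseless = mutual_information([0.5, 0.5], [[1.0, 0.0], [0.0, 1.0]])
blind = mutual_information([0.5, 0.5], [[0.3, 0.7], [0.3, 0.7]])
```

Terms with zero joint mass are skipped, following the usual convention $0 \log 0 = 0$.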
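The next lemma relates causal product reconstruction kernels and conditional independence.

Lemma 2.3. The following are equivalent for each $n \in \mathbb{N}$.

1) $q_{0,n}(dy^n; x^n) = \overrightarrow{q}_{0,n}(dy^n; x^n)$-a.s., defined in Definition 2.2-2).

2) For each $i = 0, 1, \ldots, n-1$, $Y_i \leftrightarrow (X^i, Y^{i-1}) \leftrightarrow (X_{i+1}, X_{i+2}, \ldots, X_n)$ forms a Markov chain.

3) For each $i = 0, 1, \ldots, n-1$, $Y^i \leftrightarrow X^i \leftrightarrow X_{i+1}$ forms a Markov chain.

According to Lemma 2.3, for causally restricted kernels

$$I(X^n; Y^n) = \int \log\Big(\frac{\overrightarrow{q}_{0,n}(dy^n; x^n)}{\nu_{0,n}(dy^n)}\Big)\, \overrightarrow{q}_{0,n}(dy^n; x^n) \otimes \mu_{0,n}(dx^n) \equiv \mathbb{I}(\mu_{0,n}, \overrightarrow{q}_{0,n}) \tag{2.2}$$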

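where (2.2) states that $I(X^n; Y^n)$ is a functional of $\{\mu_{0,n}, \overrightarrow{q}_{0,n}\}$. Hence, the causal RDF is defined by optimizing $\mathbb{I}(\mu_{0,n}, q_{0,n})$ over $q_{0,n} \in Q_{0,n}(D)$, where $Q_{0,n}(D) \triangleq \{q_{0,n} \in \mathcal{Q}(\mathcal{Y}_{0,n}; \mathcal{X}_{0,n}) : \int_{\mathcal{X}_{0,n}} \int_{\mathcal{Y}_{0,n}} d_{0,n}(x^n, y^n)\, q_{0,n}(dy^n; x^n) \otimes \mu_{0,n}(dx^n) \le D\}$, subject to the realizability constraint $q_{0,n}(dy^n; x^n) = \overrightarrow{q}_{0,n}(dy^n; x^n)$-a.s.; equivalently, by optimizing the functional in (2.2) over causally restricted kernels which satisfy the distortion constraint.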

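Definition 2.4. (Causal Rate Distortion Function) Suppose $d_{0,n}(x^n, y^n) \triangleq \sum_{i=0}^{n} \rho_{0,i}(x^i, y^i)$, where $\rho_{0,i} : \mathcal{X}_{0,i} \times \mathcal{Y}_{0,i} \mapsto [0, \infty)$ is a sequence of $\mathcal{B}(\mathcal{X}_{0,i}) \times \mathcal{B}(\mathcal{Y}_{0,i})$-measurable distortion functions, and let $\overrightarrow{Q}_{0,n}(D)$ (assuming it is non-empty) denote the average distortion or fidelity constraint defined by

$$\overrightarrow{Q}_{0,n}(D) \triangleq \Big\{q_{0,n} \in \overrightarrow{Q}_{ad} : \ell_{d_{0,n}}(\overrightarrow{q}_{0,n}) \triangleq \int_{\mathcal{X}_{0,n}} \Big(\int_{\mathcal{Y}_{0,n}} d_{0,n}(x^n, y^n)\, \overrightarrow{q}_{0,n}(dy^n; x^n)\Big) \otimes \mu_{0,n}(dx^n) \le D\Big\}, \quad D \ge 0$$

The causal RDF associated with the causally restricted kernel is defined by

$$R^c_{0,n}(D) \triangleq \inf_{\overrightarrow{q}_{0,n} \in \overrightarrow{Q}_{0,n}(D)} \mathbb{I}(\mu_{0,n}, \overrightarrow{q}_{0,n}) \tag{2.3}$$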



3. Existence of Optimal Causal Reconstruction Kernel. In this section, appropriate topologies and function spaces are introduced and existence of the minimizing causal product kernel in (2.3) is shown.

3.1. Abstract Spaces. Let $BC(\mathcal{Y}_{0,n})$ denote the vector space of bounded continuous real valued functions defined on the Polish space $\mathcal{Y}_{0,n}$. Furnished with the sup norm topology, this is a Banach space. The topological dual of $BC(\mathcal{Y}_{0,n})$, denoted by $(BC(\mathcal{Y}_{0,n}))^*$, is isometrically isomorphic to the Banach space of finitely additive regular bounded signed measures on $\mathcal{Y}_{0,n}$ [5], denoted by $M_{rba}(\mathcal{Y}_{0,n})$. Let $\Pi_{rba}(\mathcal{Y}_{0,n}) \subset M_{rba}(\mathcal{Y}_{0,n})$ denote the set of regular bounded finitely additive probability measures on $\mathcal{Y}_{0,n}$. Clearly, if $\mathcal{Y}_{0,n}$ is compact, then $(BC(\mathcal{Y}_{0,n}))^*$ will be isometrically isomorphic to the space of countably additive signed measures, as in [4]. Denote by $L_1(\mu_{0,n}, BC(\mathcal{Y}_{0,n}))$ the space of all $\mu_{0,n}$-integrable functions defined on $\mathcal{X}_{0,n}$ with values in $BC(\mathcal{Y}_{0,n})$, so that for each $\phi \in L_1(\mu_{0,n}, BC(\mathcal{Y}_{0,n}))$ its norm is defined by

$$\|\phi\|_{\mu_{0,n}} \triangleq \int_{\mathcal{X}_{0,n}} \|\phi(x^n, \cdot)\|_{BC(\mathcal{Y}_{0,n})}\, \mu_{0,n}(dx^n) < \infty$$

The norm topology $\|\cdot\|_{\mu_{0,n}}$ makes $L_1(\mu_{0,n}, BC(\mathcal{Y}_{0,n}))$ a Banach space, and it follows from the theory of "lifting" [8] that the dual of this space is $L_\infty^w(\mu_{0,n}, M_{rba}(\mathcal{Y}_{0,n}))$, denoting the space of all $M_{rba}(\mathcal{Y}_{0,n})$ valued functions $\{q\}$ which are weak*-measurable in the sense that for each $\phi \in BC(\mathcal{Y}_{0,n})$, $x^n \mapsto q_{x^n}(\phi) \triangleq \int_{\mathcal{Y}_{0,n}} \phi(y^n)\, q(dy^n; x^n)$ is $\mu_{0,n}$-measurable and $\mu_{0,n}$-essentially bounded.

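3.2. Weak*-Compactness and Existence. Define an admissible set of stochastic kernels associated with classical RDF by

$$Q_{ad} \triangleq L_\infty^w(\mu_{0,n}, \Pi_{rba}(\mathcal{Y}_{0,n})) \subset L_\infty^w(\mu_{0,n}, M_{rba}(\mathcal{Y}_{0,n}))$$

Clearly, $Q_{ad}$ is a unit sphere in $L_\infty^w(\mu_{0,n}, M_{rba}(\mathcal{Y}_{0,n}))$. For each $\phi \in L_1(\mu_{0,n}, BC(\mathcal{Y}_{0,n}))$ we can define a linear functional on $L_\infty^w(\mu_{0,n}, M_{rba}(\mathcal{Y}_{0,n}))$ by

$$\ell_\phi(q_{0,n}) \triangleq \int_{\mathcal{X}_{0,n}} \Big(\int_{\mathcal{Y}_{0,n}} \phi(x^n, y^n)\, q_{0,n}(dy^n; x^n)\Big) \mu_{0,n}(dx^n)$$

This is a bounded, linear and weak*-continuous functional on $L_\infty^w(\mu_{0,n}, M_{rba}(\mathcal{Y}_{0,n}))$. For $d_{0,n} : \mathcal{X}_{0,n} \times \mathcal{Y}_{0,n} \mapsto [0, \infty)$ measurable and $d_{0,n} \in L_1(\mu_{0,n}, BC(\mathcal{Y}_{0,n}))$ the distortion constraint set of the classical RDF is $Q_{0,n}(D) \triangleq \{q \in Q_{ad} : \ell_{d_{0,n}}(q_{0,n}) \le D\}$.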
Lemma 3.1. For $d_{0,n} \in L_1(\mu_{0,n}, BC(\mathcal{Y}_{0,n}))$, the set $Q_{0,n}(D)$ is a weak*-bounded and weak*-closed subset of $Q_{ad}$.

Hence $Q_{0,n}(D)$ is weak*-compact (compactness of $Q_{ad}$ follows from Alaoglu's Theorem).

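Lemma 3.2. Let $\mathcal{X}_{0,n}$, $\mathcal{Y}_{0,n}$ be two Polish spaces and $d_{0,n} : \mathcal{X}_{0,n} \times \mathcal{Y}_{0,n} \mapsto [0, \infty]$ a measurable, nonnegative, extended real valued function, such that for a fixed $x^n \in \mathcal{X}_{0,n}$, $y^n \mapsto d_{0,n}(x^n, \cdot)$ is continuous on $\mathcal{Y}_{0,n}$, for $\mu_{0,n}$-almost all $x^n \in \mathcal{X}_{0,n}$, and $d_{0,n} \in L_1(\mu_{0,n}, BC(\mathcal{Y}_{0,n}))$. For any $D \in [0, \infty)$, introduce the set

$$Q_{0,n}(D) \triangleq \Big\{q_{0,n} \in Q_{ad} : \int_{\mathcal{X}_{0,n}} \Big(\int_{\mathcal{Y}_{0,n}} d_{0,n}(x^n, y^n)\, q_{0,n}(dy^n; x^n)\Big) \mu_{0,n}(dx^n) \le D\Big\}$$

and suppose it is nonempty.

Then $Q_{0,n}(D)$ is a weak*-closed subset of $Q_{ad}$ and hence weak*-compact.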

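Next, we define the realizability constraint via causally restricted kernels as follows

$$\overrightarrow{Q}_{ad} \triangleq \Big\{q_{0,n} \in Q_{ad} : q_{0,n}(dy^n; x^n) = \overrightarrow{q}_{0,n}(dy^n; x^n)\text{-a.s.}\Big\}$$

which satisfy an average distortion function as follows:

$$\overrightarrow{Q}_{0,n}(D) \triangleq \Big\{q_{0,n} \in \overrightarrow{Q}_{ad} : \ell_{d_{0,n}}(\overrightarrow{q}_{0,n}) \triangleq \int_{\mathcal{X}_{0,n}} \Big(\int_{\mathcal{Y}_{0,n}} d_{0,n}(x^n, y^n)\, \overrightarrow{q}_{0,n}(dy^n; x^n)\Big) \otimes \mu_{0,n}(dx^n) \le D\Big\}$$

The following is assumed.

Assumption 3.3. Let $\mathcal{X}_{0,n}$ and $\mathcal{Y}_{0,n}$ be Polish spaces and $\overrightarrow{Q}_{ad}$ weak*-closed.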

Remark 3.4. The conditions 1) $\mathcal{Y}_{0,n}$ is a compact Polish space, and 2) for all $h(\cdot) \in BC(\mathcal{Y}_n)$, the function $(x^n, y^{n-1}) \in \mathcal{X}_{0,n} \times \mathcal{Y}_{0,n-1} \mapsto \int_{\mathcal{Y}_n} h(y)\, q_n(dy; y^{n-1}, x^n) \in \mathbb{R}$ is continuous jointly in the variables $(x^n, y^{n-1}) \in \mathcal{X}_{0,n} \times \mathcal{Y}_{0,n-1}$, are sufficient for $\overrightarrow{Q}_{ad}$ to be weak*-closed.

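Theorem 3.5. Suppose Assumption 3.3 and the conditions of Lemma 3.2 hold. For any $D \in [0, \infty)$, introduce the set

$$\overrightarrow{Q}_{0,n}(D) \triangleq \Big\{q_{0,n} \in \overrightarrow{Q}_{ad} : \int_{\mathcal{X}_{0,n}} \Big(\int_{\mathcal{Y}_{0,n}} d_{0,n}(x^n, y^n)\, \overrightarrow{q}_{0,n}(dy^n; x^n)\Big) \mu_{0,n}(dx^n) \le D\Big\}$$

and suppose it is nonempty.

Then $\overrightarrow{Q}_{0,n}(D)$ is a weak*-closed subset of $\overrightarrow{Q}_{ad}$ and hence weak*-compact.

Theorem 3.6. Under the conditions of Theorem 3.5, $R^c_{0,n}(D)$ has a minimum.

Proof. Follows from weak*-compactness of $\overrightarrow{Q}_{ad}$ and lower semicontinuity of $\mathbb{I}(\mu_{0,n}, \overrightarrow{q}_{0,n})$ with respect to $\overrightarrow{q}_{0,n}$ for a fixed $\mu_{0,n}$. $\square$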

4. Necessary Conditions of Optimality of Causal Rate Distortion Func- 
tion. In this section the form of the optimal causal product reconstruction kernels is 
derived under a stationarity assumption. The method is based on calculus of varia- 
tions on the space of measures [9]. 

Assumption 4.1. The family of measures $\overrightarrow{q}_{0,n}(dy^n; x^n) = \otimes_{i=0}^{n} q_i(dy_i; y^{i-1}, x^i)$-a.s. is the convolution of stationary conditional distributions.

Assumption 4.1 holds for a stationary process $\{(X_i, Y_i) : i \in \mathbb{N}\}$ and $\rho_{0,i}(x^i, y^i) \equiv \rho(T^i x^n, T^i y^n)$, where $T^i x^n$ is the shift operator on $x^n$. Utilizing Assumption 4.1, which holds for stationary processes and a single letter distortion function, the Gateaux differential of $\mathbb{I}(\mu_{0,n}, \overrightarrow{q}_{0,n})$ is done in only one direction (since $q_i(dy_i; y^{i-1}, x^i)$ are stationary).

The constrained problem defined by (2.3) can be reformulated using Lagrange multipliers as follows (equivalence of constrained and unconstrained problems follows similarly as in [9]).

$$R^c_{0,n}(D) = \inf_{\overrightarrow{q}_{0,n} \in \overrightarrow{Q}_{ad}} \Big\{\mathbb{I}(\mu_{0,n}, \overrightarrow{q}_{0,n}) - s\big(\ell_{d_{0,n}}(\overrightarrow{q}_{0,n}) - D\big)\Big\} \tag{4.1}$$

and $s \in (-\infty, 0]$ is the Lagrange multiplier.

Note that $\overrightarrow{Q}_{ad}$ is a proper subset of the vector space $L_\infty^w(\mu_{0,n}, M_{rba}(\mathcal{Y}_{0,n}))$ which represents the realizability constraint. Therefore, one should introduce another set of Lagrange multipliers to obtain an optimization on the vector space $L_\infty^w(\mu_{0,n}, M_{rba}(\mathcal{Y}_{0,n}))$ without constraints.
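To see informally how an extra multiplier leads to an exponential form, consider the simplest special case, assumed here for illustration only: finite alphabets and $n = 0$, so that the causal constraint is vacuous and the only remaining constraint is that each $q(\cdot|x)$ sums to one. The multiplier $\lambda(x)$ below is a hypothetical symbol introduced for this sketch, and the computation is a heuristic stationarity argument rather than the calculus of variations on the space of measures used in the paper.

```latex
% Sketch only: finite alphabets, n = 0, mu(x) > 0 for all x.
% lambda(x) is a hypothetical multiplier enforcing sum_y q(y|x) = 1.
\begin{align*}
J(q) &= \sum_{x,y} \mu(x)\, q(y|x) \log\frac{q(y|x)}{\nu(y)}
        - s\Big(\sum_{x,y} \mu(x)\, q(y|x)\, \rho(x,y) - D\Big)
        + \sum_{x} \lambda(x)\Big(\sum_{y} q(y|x) - 1\Big),
        \qquad \nu(y) = \sum_{x} \mu(x)\, q(y|x), \\
\frac{\partial J}{\partial q(y|x)}
     &= \mu(x)\Big(\log\frac{q(y|x)}{\nu(y)} - s\,\rho(x,y)\Big) + \lambda(x) = 0, \\
q(y|x) &= \frac{e^{s\rho(x,y)}\, \nu(y)}{\sum_{y'} e^{s\rho(x,y')}\, \nu(y')}.
\end{align*}
```

In the second line the contributions from differentiating $\nu(y)$ cancel, since $\partial_{q(y|x)} \sum_{x',y'} \mu(x') q(y'|x') \log q(y'|x') = \mu(x)(\log q(y|x) + 1)$ and $\partial_{q(y|x)} \sum_{y'} \nu(y') \log \nu(y') = \mu(x)(\log \nu(y) + 1)$; the last line follows by solving for $q(y|x)$ and normalizing over $y$.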

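Theorem 4.2. Suppose $d_{0,n}(x^n, y^n) = \sum_{i=0}^{n} \rho(T^i x^n, T^i y^n)$ and Assumption 3.3 holds. The infimum in (4.1) is attained at $q^*_{0,n} \in L_\infty^w(\mu_{0,n}, \Pi_{rba}(\mathcal{Y}_{0,n})) \cap \overrightarrow{Q}_{ad}$ given by

$$q^*_{0,n}(dy^n; x^n) = \overrightarrow{q}^*_{0,n}(dy^n; x^n)\text{-a.s.} = \otimes_{i=0}^{n} q^*_i(dy_i; y^{i-1}, x^i)\text{-a.s.}$$

$$= \otimes_{i=0}^{n} \frac{e^{s\rho(T^i x^n, T^i y^n)}\, \nu^*_i(dy_i; y^{i-1})}{\int_{\mathcal{Y}_i} e^{s\rho(T^i x^n, T^i y^n)}\, \nu^*_i(dy_i; y^{i-1})}\text{-a.s.} \tag{4.2}$$

and $\nu^*_i(dy_i; y^{i-1}) \in \mathcal{Q}(\mathcal{Y}_i; \mathcal{Y}_{0,i-1})$. The causal RDF is given by

$$R^c_{0,n}(D) = sD - \sum_{i=0}^{n} \int_{\mathcal{X}_{0,i} \times \mathcal{Y}_{0,i-1}} \log\Big(\int_{\mathcal{Y}_i} e^{s\rho(T^i x^n, T^i y^n)}\, \nu^*_i(dy_i; y^{i-1})\Big)\, \overrightarrow{q}^*_{0,i-1}(dy^{i-1}; x^{i-1}) \otimes \mu_{0,i}(dx^i) \tag{4.3}$$

If $R^c_{0,n}(D) > 0$ then $s < 0$ and

$$\sum_{i=0}^{n} \int_{\mathcal{X}_{0,i}} \int_{\mathcal{Y}_{0,i}} \rho(T^i x^n, T^i y^n)\, \overrightarrow{q}^*_{0,i}(dy^i; x^i) \otimes \mu_{0,i}(dx^i) = D$$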

Remark 4.3. Note that if the distortion function satisfies $\rho(T^i x^n, T^i y^n) = \rho(x_i, T^i y^n)$ then $q^*_i(dy_i; y^{i-1}, x^i) = q^*_i(dy_i; y^{i-1}, x_i)$-a.s., $i \in \mathbb{N}^n$, that is, the reconstruction kernel is Markov in $X^n$.
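The exponential form in (4.2) can be explored numerically in the simplest special case. The sketch below assumes a finite alphabet, a single time step ($n = 0$, where the causal and classical problems coincide), and that $\nu^*$ is the output marginal induced by $q^*$; it then alternates between the tilted kernel and its marginal in the style of the classical Blahut-Arimoto iteration. This is a swapped-in classical technique for illustration, not the realization procedure of the paper, and the names `tilted_kernel` and `blahut_arimoto` are hypothetical.

```python
import math


def tilted_kernel(nu, rho, s):
    """q(y|x) = exp(s*rho[x][y]) nu(y) / sum_y' exp(s*rho[x][y']) nu(y')."""
    kernel = []
    for row in rho:
        weights = [nu_y * math.exp(s * r) for nu_y, r in zip(nu, row)]
        z = sum(weights)
        kernel.append([w / z for w in weights])
    return kernel


def blahut_arimoto(mu, rho, s, nu0, iterations=200):
    """Alternate between the tilted kernel and its output marginal (s <= 0).

    Returns (q, nu, distortion, rate) with the rate in nats.
    """
    n_x, n_y = len(mu), len(nu0)
    nu = list(nu0)
    q = tilted_kernel(nu, rho, s)
    for _ in range(iterations):
        q = tilted_kernel(nu, rho, s)
        # Output marginal induced by the current kernel.
        nu = [sum(mu[x] * q[x][y] for x in range(n_x)) for y in range(n_y)]
    distortion = sum(
        mu[x] * q[x][y] * rho[x][y] for x in range(n_x) for y in range(n_y)
    )
    rate = sum(
        mu[x] * q[x][y] * math.log(q[x][y] / nu[y])
        for x in range(n_x)
        for y in range(n_y)
        if q[x][y] > 0.0
    )
    return q, nu, distortion, rate


# Uniform binary source with Hamming distortion; s = -2 is an arbitrary slope.
mu = [0.5, 0.5]
rho = [[0.0, 1.0], [1.0, 0.0]]
q, nu, distortion, rate = blahut_arimoto(mu, rho, s=-2.0, nu0=[0.3, 0.7])
```

For this symmetric example the iteration settles at a uniform output marginal, so the distortion equals $e^{s}/(1 + e^{s})$ and the rate equals $\log 2$ minus the binary entropy of that distortion, in nats. Each value of $s \le 0$ traces out one point of the rate distortion curve.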



5. Realization of Causal Rate Distortion Function. Fig. 5.1 illustrates a cascade of sub-systems which realizes the causal RDF. This is called source-channel matching in information theory [6]. It is also described in [3] and [11] and is essential in control applications since this technique allows us to design encoding/decoding schemes without delays.



[Figure: cascade of blocks labeled Source, Encoder, Decoder]

Fig. 5.1. Block Diagram of Realizable Causal Rate Distortion Function

Examples to illustrate the concepts can be found in [3], [10].